AAAI.2024 - Student Abstract and Poster Program

Total: 123

#1 Multipartite Entity Resolution: Motivating a K-Tuple Perspective (Student Abstract)

Authors: Adin Aberbach ; Mayank Kejriwal ; Ke Shen

Entity Resolution (ER) is the problem of algorithmically matching records, mentions, or entries that refer to the same underlying real-world entity. Traditionally, the problem assumes (at most) two datasets, between which records need to be matched. There is considerably less research in ER when k > 2 datasets are involved. The evaluation of such multipartite ER (M-ER) is especially complex, since the usual ER metrics assume (whether implicitly or explicitly) k < 3. This paper takes the first step towards motivating a k-tuple approach for evaluating M-ER. Using standard algorithms and k-tuple versions of metrics like precision and recall, our preliminary results suggest a significant difference compared to aggregated pairwise evaluation, which would first decompose the M-ER problem into independent bipartite problems and then aggregate their metrics. Hence, M-ER may be more challenging and warrant more novel approaches than current decomposition-based pairwise approaches would suggest.
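
The gap between k-tuple and aggregated pairwise evaluation can be illustrated with a small sketch. The abstract does not define its metrics, so everything below is an assumption made for illustration: a predicted k-tuple counts as correct only if it exactly matches a ground-truth tuple, and the pairwise baseline projects tuples onto each pair of datasets and averages the bipartite scores. The function names and toy records are hypothetical.

```python
from itertools import combinations


def precision_recall(predicted, truth):
    """Precision and recall of a predicted set against a ground-truth set."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def pairwise_projection(tuples, i, j):
    """Project k-tuples onto the bipartite problem between datasets i and j."""
    return {(t[i], t[j]) for t in tuples}


def aggregated_pairwise(predicted, truth, k):
    """Average precision and recall over all k-choose-2 bipartite sub-problems."""
    scores = [
        precision_recall(
            pairwise_projection(predicted, i, j),
            pairwise_projection(truth, i, j),
        )
        for i, j in combinations(range(k), 2)
    ]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Toy data (assumed): three datasets A, B, C with two entities each.
truth = {("a1", "b1", "c1"), ("a2", "b2", "c2")}
# The matcher gets every A-B link right but swaps the C records.
predicted = {("a1", "b1", "c2"), ("a2", "b2", "c1")}

tuple_scores = precision_recall(predicted, truth)
pair_scores = aggregated_pairwise(predicted, truth, k=3)
```

In this toy case no predicted tuple is fully correct, so the k-tuple scores are zero, while the aggregated pairwise scores still credit the correct A-B links. This is one way the two evaluations can diverge; it is not a reproduction of the paper's results.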

#2 Preference-Aware Constrained Multi-Objective Bayesian Optimization (Student Abstract)

Authors: Alaleh Ahmadianshalchi ; Syrine Belakaria ; Janardhan Rao Doppa

We consider the problem of constrained multi-objective optimization over black-box objectives, with user-defined preferences and a largely infeasible input space. Our goal is to approximate the optimal Pareto set from the small fraction of feasible inputs. The main challenges include a huge design space, multiple objectives, numerous constraints, and rare feasible inputs identified only through expensive experiments. We present PAC-MOO, a novel preference-aware multi-objective Bayesian optimization algorithm to solve this problem. It leverages surrogate models for objectives and constraints to intelligently select the sequence of inputs for evaluation to achieve the target goal.

#3 Incorporating Serverless Computing into P2P Networks for ML Training: In-Database Tasks and Their Scalability Implications (Student Abstract)

Author: Amine Barrak

Distributed ML addresses challenges from increasing data and model complexities. Peer-to-peer (P2P) networks in distributed ML offer scalability and fault tolerance. However, they also encounter challenges related to resource consumption and communication overhead as the number of participating peers grows. This research introduces a novel architecture that combines serverless computing with P2P networks for distributed training. Serverless computing enhances this model with parallel processing and cost-effective scalability, suitable for resource-intensive tasks. Preliminary results show that peers can offload expensive computational tasks to serverless platforms. However, their inherent statelessness necessitates strong communication methods, suggesting a pivotal role for databases. To this end, we have enhanced an in-memory database to support ML training tasks.

#4 Sleep-Like Unsupervised Replay Improves Performance When Data Are Limited or Unbalanced (Student Abstract)

Authors: Anthony Bazhenov ; Pahan Dewasurendra ; Giri Krishnan ; Jean Erik Delanois

The performance of artificial neural networks (ANNs) degrades when training data are limited or imbalanced. In contrast, the human brain can learn quickly from just a few examples. Here, we investigated the role of sleep in improving the performance of ANNs trained with limited data on the MNIST and Fashion MNIST datasets. Sleep was implemented as an unsupervised phase with local Hebbian type learning rules. We found a significant boost in accuracy after the sleep phase for models trained with limited data in the range of 0.5-10% of total MNIST or Fashion MNIST datasets. When more than 10% of the total data was used, sleep alone had a slight negative impact on performance, but this was remedied by fine-tuning on the original data. This study sheds light on a potential synaptic weight dynamics strategy employed by the brain during sleep to enhance memory performance when training data are limited or imbalanced.

#5 Coalition Formation for Task Allocation Using Multiple Distance Metrics (Student Abstract)

Authors: Tuhin Kumar Biswas ; Avisek Gupta ; Narayan Changder ; Redha Taguelmimt ; Samir Aknine ; Samiran Chattopadhyay ; Animesh Dutta

Simultaneous Coalition Structure Generation and Assignment (SCSGA) is an important research problem in multi-agent systems. Given n agents and m tasks, the aim of SCSGA is to form m disjoint coalitions of n agents such that between the coalitions and tasks there is a one-to-one mapping, which ensures each coalition is capable of accomplishing the assigned task. SCSGA with Multi-dimensional Features (SCSGA-MF) extends the problem by introducing a d-dimensional vector for each agent and task. We propose a heuristic algorithm called Multiple Distance Metric (MDM) approach to solve SCSGA-MF. Experimental results confirm that MDM produces near optimal solutions, while being feasible for large-scale inputs within a reasonable time frame.

#6 The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers (Student Abstract)

Author: Rickard Brännvall

To enhance the computational efficiency of quantized Transformers, we replace the dot-product and Softmax-based attention with an alternative mechanism involving addition and ReLU activation only. This side-steps the expansion to double precision often required by matrix multiplication and avoids costly Softmax evaluations but maintains much of the core functionality of conventional dot-product attention. It can enable more efficient execution and support larger quantized Transformer models on resource-constrained hardware or alternative arithmetic systems like homomorphic encryption. Training experiments on four common benchmark tasks show test set prediction scores comparable to those of conventional Transformers with dot-product attention. Our scaling experiments also suggest significant computational savings, both in plaintext and under encryption. In particular, we believe that the ReLU and addition-based attention mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the costly multiplication of encrypted variables.
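
As a rough illustration of attention built from addition and ReLU only, the sketch below scores each query-key pair by Manhattan distance and lets that score subtract from ("inhibit") the values before a ReLU. The abstract gives no formula, so this exact form, the `gamma` scale, and the function names are assumptions rather than the author's definition.

```python
def relu(x):
    """Rectified linear unit on a scalar."""
    return x if x > 0.0 else 0.0


def inhibitor_attention(queries, keys, values, gamma=1.0):
    """Attention sketch without dot products or Softmax.

    Assumed form: z_ij = sum_k |q_ik - k_jk| / gamma,
                  h_ik = sum_j relu(v_jk - z_ij).
    A large query-key distance suppresses the corresponding value.
    """
    outputs = []
    for q in queries:
        # Manhattan distance between this query and every key.
        scores = [
            sum(abs(qi - ki) for qi, ki in zip(q, k)) / gamma for k in keys
        ]
        row = [
            sum(relu(v[d] - z) for v, z in zip(values, scores))
            for d in range(len(values[0]))
        ]
        outputs.append(row)
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 4.0]]
values = [[2.0, 3.0], [5.0, 1.0]]
out = inhibitor_attention(queries, keys, values)
```

Here the first key matches the query exactly, so its value passes through unchanged, while the distant second key is inhibited to zero. With `gamma` restricted to a power of two, the scaling could be a bit shift in integer arithmetic, though that choice is also an assumption.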

#7 JoLT: Jointly Learned Representations of Language and Time-Series for Clinical Time-Series Interpretation (Student Abstract)

Authors: Yifu Cai ; Arvind Srinivasan ; Mononito Goswami ; Arjun Choudhry ; Artur Dubrawski

Time-series and text data are prevalent in healthcare and frequently co-exist, yet they are typically modeled in isolation. Even studies that jointly model time-series and text do so by converting time-series to images or graphs. We hypothesize that explicitly modeling time-series jointly with text can improve tasks such as summarization and question answering for time-series data, which have received little attention so far. To address this gap, we introduce JoLT to jointly learn desired representations from pre-trained time-series and text models. JoLT utilizes a Querying Transformer (Q-Former) to align the time-series and text representations. Our experiments on a large real-world electrocardiography dataset for medical time-series summarization show that JoLT outperforms state-of-the-art image captioning approaches.

#8 Data-Driven Discovery of Design Specifications (Student Abstract)

Authors: Angela Chen ; Nicholas Gisolfi ; Artur Dubrawski

Ensuring a machine learning model’s trustworthiness is crucial to prevent potential harm. One way to foster trust is through the formal verification of the model’s adherence to essential design requirements. However, this approach relies on well-defined, application-domain-centric criteria with which to test the model, and such specifications may be cumbersome to collect in practice. We propose a data-driven approach for creating specifications to evaluate a trained model effectively. Implementing this framework allows us to prove that the model will exhibit safe behavior while minimizing the false-positive prediction rate. This strategy enhances predictive accuracy and safety, providing deeper insight into the model’s strengths and weaknesses, and promotes trust through a systematic approach.

#9 Interpreting Temporal Knowledge Graph Reasoning (Student Abstract)

Authors: Bin Chen ; Kai Yang ; Wenxin Tai ; Zhangtao Cheng ; Leyuan Liu ; Ting Zhong ; Fan Zhou

Temporal knowledge graph reasoning is an essential task that holds immense value in diverse real-world applications. Existing studies mainly focus on leveraging structural and sequential dependencies, excelling in tasks like entity and link prediction. However, they confront a notable interpretability gap in their predictions, a pivotal facet for comprehending model behavior. In this study, we propose an innovative method, LSGAT, which not only exhibits remarkable precision in entity predictions but also enhances interpretability by identifying pivotal historical events influencing event predictions. LSGAT enables concise explanations for prediction outcomes, offering valuable insights into the otherwise enigmatic "black box" reasoning process. Through an exploration of the implications of the most influential events, it facilitates a deeper understanding of the underlying mechanisms governing predictions.

#10 The Language Model Can Have the Personality: Joint Learning for Personality Enhanced Language Model (Student Abstract)

Authors: Tianyi Chen ; Feiqi Cao ; Yihao Ding ; Caren Han

With the introduction of large language models, chatbots are becoming more conversational, communicating more effectively and handling increasingly complex tasks. To make a chatbot more relatable and engaging, we propose a new language model that captures human-like personality. In this paper, we propose a systematic Personality-Enhanced Language Model (PELM) approach that jointly learns personality classification and language generation tasks. The proposed PELM leverages a dataset labelled with a defined personality typology, the Myers-Briggs Type Indicator, and uses a joint learning and cross-teaching structure, consisting of classification and language modelling components, to incorporate personalities via both distinctive types and textual information. The results show that PELM can generate better personality-based outputs than baseline models.

#11 MapLE: Matching Molecular Analogues Promptly with Low Computational Resources by Multi-Metrics Evaluation (Student Abstract)

Authors: Xiaojian Chen ; Chuyue Liao ; Yanhui Gu ; Yafei Li ; Jinlan Wang ; Yi Chen ; Masaru Kitsuregawa

Matching molecular analogues is a research problem in computational chemistry and bioinformatics that aims to identify molecules that are structurally or functionally similar to a target molecule. Recent studies on matching analogous molecules have predominantly concentrated on enhancing effectiveness, often sidelining computational efficiency, particularly in contexts of low computational resources. This oversight poses challenges in many real applications (e.g., drug discovery and catalyst generation). To tackle this issue, we propose a general strategy named MapLE, aiming to promptly match analogous molecules with low computational resources by multi-metrics evaluation. Experimental evaluation conducted on a public biomolecular dataset validates the excellent and efficient performance of the proposed strategy.

#12 Dual Mapping of 2D StyleGAN for 3D-Aware Image Generation and Manipulation (Student Abstract)

Authors: Zhuo Chen ; Haimei Zhao ; Chaoyue Wang ; Bo Yuan ; Xiu Li

3D-aware GANs successfully solve the problem of 3D-consistency generation and furthermore provide a 3D shape of the generated object. However, the application of the volume renderer disturbs the disentanglement of the latent space, which makes it difficult to manipulate 3D-aware GANs and lowers the image quality of style-based generators. In this work, we devise a dual-mapping framework to make the generated images of pretrained 2D StyleGAN consistent in 3D space. We utilize a tri-plane representation to estimate the 3D shape of the generated object and two mapping networks to bridge the latent space of StyleGAN and the 3D tri-plane space. Our method does not alter the parameters of the pretrained generator, which means the interpretability of latent space is preserved for various image manipulations. Experiments show that our method lifts the 3D awareness of pretrained 2D StyleGAN to 3D-aware GANs and outperforms the 3D-aware GANs in controllability and image quality.

#13 STViT: Improving Self-Supervised Multi-Camera Depth Estimation with Spatial-Temporal Context and Adversarial Geometry Regularization (Student Abstract)

Authors: Zhuo Chen ; Haimei Zhao ; Bo Yuan ; Xiu Li

Multi-camera depth estimation has recently garnered significant attention due to its substantial practical implications in the realm of autonomous driving. In this paper, we delve into the task of self-supervised multi-camera depth estimation and propose an innovative framework, STViT, featuring several noteworthy enhancements: 1) we propose a Spatial-Temporal Transformer to comprehensively exploit both local connectivity and the global context of image features, while learning enriched spatial-temporal cross-view correlations to recover 3D geometry; 2) to alleviate the severe effect of adverse conditions, e.g., rainy weather and nighttime driving, we introduce a GAN-based Adversarial Geometry Regularization Module (AGR) to further constrain the depth estimation with unpaired normal-condition depth maps and prevent the model from being incorrectly trained. Experiments on the challenging autonomous driving datasets nuScenes and DDAD show that our method achieves state-of-the-art performance.

#14 Simple Orthogonal Graph Representation Learning (Student Abstract)

Authors: Taoyong Cui ; Yuhan Dong

Graph neural networks (GNNs) have attracted significant interest recently since they can effectively process and analyze graph-structured data commonly found in real-world applications. However, GNNs become increasingly difficult to train as the number of layers grows. The essence of this problem is that stacking layers reduces the stability of forward propagation and gradient back-propagation. Moreover, as the scale of models (measured by the number of parameters) increases, how to efficiently and effectively adapt them to particular downstream tasks becomes an intriguing research issue. In this work, motivated by the effect of orthogonality constraints, we propose a simple orthogonal training framework to impose the orthogonality constraints on GNNs, which can help models find a solution vector in a specific low-dimensional subspace and stabilize the signaling processes in both the forward and backward directions. Specifically, we propose a novel polar decomposition-based orthogonal initialization (PDOI-R) algorithm, which can identify the low intrinsic dimension within the Stiefel Manifold and stabilize the training process. Extensive experiments demonstrate the effectiveness of the proposed method in multiple downstream tasks, showcasing its generality. The simple method can help existing state-of-the-art models achieve better performance.

#15 Contrastive Learning for Low-Light Raw Denoising (Student Abstract)

Authors: Taoyong Cui ; Yuhan Dong

Image/video denoising in low-light scenes is an extremely challenging problem due to limited photon count and high noise. In this paper, we propose a novel approach with contrastive learning to address this issue. Inspired by the success of contrastive learning in some high-level computer vision tasks, we bring this idea to the low-level denoising task. To achieve this goal, we introduce a new denoising contrastive regularization (DCR) to exploit the information of noisy images and clean images. In the feature space, DCR pulls the denoised image closer to the clean image and pushes it away from the noisy image. In addition, we build a new feature embedding network called Wnet, which is more effective at extracting high-frequency information. We conduct the experiments on a real low-light dataset that captures still images taken on a moonless clear night in 0.6 millilux and videos under starlight (no moon present). The results show that our method can achieve a higher PSNR and better visual quality compared with existing methods.

#16 Strategic Recommendation: Revenue Optimal Matching for Online Platforms (Student Abstract)

Authors: Luca D'Amico-Wong ; Gary Qiurui Ma ; David Parkes

We consider a platform in a two-sided market with unit-supply sellers and unit-demand buyers. Each buyer can transact with a subset of sellers it knows off platform and another seller that the platform recommends. Given the choice of sellers, transactions and prices form a competitive equilibrium. The platform selects one seller for each buyer, and charges a fixed percentage of prices to all transactions that it recommends. The platform seeks to maximize total revenue. We show that the platform's problem is NP-hard, even when each buyer knows at most two sellers off platform. Finally, when each buyer values all sellers equally and knows only one seller off platform, we provide a polynomial time algorithm that optimally solves the problem.

#17 Improving Faithfulness in Abstractive Text Summarization with EDUs Using BART (Student Abstract)

Authors: Narjes Delpisheh ; Yllias Chali

Abstractive text summarization uses the summarizer’s own words to capture the main information of a source document in a summary. While it is more challenging to automate than extractive text summarization, recent advancements in deep learning approaches and pre-trained language models have improved its performance. However, abstractive text summarization still has issues such as unfaithfulness. To address this problem, we propose a new approach that utilizes important Elementary Discourse Units (EDUs) to guide BART-based text summarization. Our approach shows improvement in truthfulness and source document coverage in comparison to some previous studies.

#18 Scene Flow Prior Based Point Cloud Completion with Masked Transformer (Student Abstract)

Authors: Junzhe Ding ; Yufei Que ; Jin Zhang ; Cheng Wu

An effective point cloud completion mechanism is of great significance for real-world tasks such as autonomous driving, robotics applications, and multi-target tracking. In this paper, we propose a point cloud completion method using a self-supervised transformer model based on the contextual constraints of scene flow. Our method uses the multi-frame point cloud context relationship as a guide to generate a series of token proposals; this prior condition ensures the stability of the point cloud completion. The experimental results show that the method proposed in this paper achieves high accuracy and good stability.

#19 Kepler Light Curve Classification Using Deep Learning and Markov Transition Field (Student Abstract)

Authors: Shane Donnelly ; Ayan Dutta

An exoplanet is a planet that is not part of our solar system. Whether life exists on one or more of these exoplanets has fascinated humans for centuries. NASA’s Kepler Space Telescope has discovered more than 70% of known exoplanets in our universe. However, manually determining whether a Kepler light curve indicates an exoplanet becomes infeasible with the large volume of data. We therefore propose a deep learning-based strategy to automatically classify a Kepler light curve. More specifically, we first convert the light curve time series into its corresponding Markov Transition Field (MTF) image and then classify it. Results show that the accuracy of the proposed technique is 99.39%, which is higher than all current state-of-the-art approaches.
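
The conversion of a time series into a Markov Transition Field can be sketched in a few lines. This is a minimal pure-Python version of the standard MTF construction (quantile binning, a first-order transition matrix, then spreading transition probabilities over all pairs of time steps); the bin count, helper names, and toy flux values are assumptions, and the authors' preprocessing and classifier are not shown.

```python
def quantile_bins(series, n_bins):
    """Assign each point to a quantile bin using its rank in the series."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    bins = [0] * len(series)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(series)
    return bins


def markov_transition_field(series, n_bins):
    """Return an n x n image where pixel (i, j) holds the probability of
    moving from the bin of point i to the bin of point j in one step."""
    bins = quantile_bins(series, n_bins)

    # Count first-order transitions between consecutive time steps.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalize the counts into transition probabilities.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])

    return [[probs[bi][bj] for bj in bins] for bi in bins]


# Toy flux values (assumed), with two dips loosely resembling transits.
flux = [1.0, 0.9, 0.2, 0.95, 1.0, 0.1]
mtf = markov_transition_field(flux, n_bins=2)
```

The resulting square matrix can be treated as a single-channel image and passed to any image classifier; real light curves would use more bins and typically some downsampling of the image.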

#20 Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract)

Authors: Danilo Dordevic ; Vukasin Bozic ; Joseph Thommes ; Daniele Coppola ; Sidak Pal Singh

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these “attentionless Transformers” to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.

#21 A SAT + Computer Algebra System Verification of the Ramsey Problem R(3, 8) (Student Abstract)

Authors: Conor Duggan ; Zhengyu Li ; Curtis Bright ; Vijay Ganesh

The Ramsey problem R(3,8) asks for the smallest n such that every red/blue coloring of the complete graph on n vertices must contain either a blue triangle or a red 8-clique. We provide the first certifiable proof that R(3,8) = 28, automatically generated by a combination of a Boolean satisfiability (SAT) solver and a computer algebra system (CAS). This SAT+CAS combination is significantly faster than a SAT-only approach. While the R(3,8) problem was first computationally solved by McKay and Min in 1992, their result was not a verifiable proof. The SAT+CAS method that we use for our proof is very general and can be applied to a wide variety of combinatorial problems.
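
The definition above can be checked directly on tiny instances by exhaustive search. The sketch below is not the SAT+CAS method: it simply enumerates every coloring, which is only practical for very small cases such as R(3,3) and is hopeless for R(3,8). The function names and the `max_n` cutoff are hypothetical.

```python
from itertools import combinations, product


def has_forced_structure(n, red_clique_size):
    """True if every red/blue coloring of K_n contains a blue triangle
    or a red clique of the given size."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    triangles = [
        [index[e] for e in combinations(t, 2)]
        for t in combinations(range(n), 3)
    ]
    cliques = [
        [index[e] for e in combinations(c, 2)]
        for c in combinations(range(n), red_clique_size)
    ]
    for coloring in product((0, 1), repeat=len(edges)):  # 0 = blue, 1 = red
        blue_triangle = any(
            all(coloring[i] == 0 for i in t) for t in triangles
        )
        red_clique = any(all(coloring[i] == 1 for i in c) for c in cliques)
        if not (blue_triangle or red_clique):
            return False  # found a coloring that avoids both structures
    return True


def ramsey_3(red_clique_size, max_n=7):
    """Smallest n up to max_n for which the structure is unavoidable."""
    for n in range(1, max_n + 1):
        if has_forced_structure(n, red_clique_size):
            return n
    return None


r33 = ramsey_3(3)
```

For R(3,3) the search covers at most 2^15 colorings of the complete graph on six vertices. The number of colorings doubles with every added edge, which is why larger cases call for symmetry breaking and solver-based methods of the kind the abstract describes.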

#22 PICSR: Prototype-Informed Cross-Silo Router for Federated Learning (Student Abstract)

Authors: Eric Enouen ; Sebastian Caldas ; Mononito Goswami ; Artur Dubrawski

Federated Learning is an effective approach for learning from data distributed across multiple institutions. While most existing studies are aimed at improving predictive accuracy of models, little work has been done to explain knowledge differences between institutions and the benefits of collaboration. Understanding these differences is critical in cross-silo federated learning domains, e.g., in healthcare or banking, where each institution or silo has a different underlying distribution and stakeholders want to understand how their institution compares to their partners. We introduce Prototype-Informed Cross-Silo Router (PICSR) which utilizes a mixture of experts approach to combine local models derived from multiple silos. Furthermore, by computing data similarity to prototypical samples from each silo, we are able to ground the router’s predictions in the underlying dataset distributions. Experiments on a real-world heart disease prediction dataset show that PICSR retains high performance while enabling further explanations on the differences among institutions compared to a single black-box model.

#23 Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)

Authors: Yimeng Fan ; Pedram Agand ; Mo Chen ; Edward J. Park ; Allison Kennedy ; Chanwoo Bae

The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of operation of a passenger vessel on the west coast of Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the vessel's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI.

#24 Local Consistency Guidance: Personalized Stylization Method of Face Video (Student Abstract)

Authors: Wancheng Feng ; Yingchao Liu ; Jiaming Pei ; Wenxuan Liu ; Chunpeng Tian ; Lukun Wang

Face video stylization aims to convert real face videos into specified reference styles. While one-shot methods perform well in single-image stylization, ensuring continuity between frames and retaining the original facial expressions remain challenging in video stylization. To address these issues, our approach employs a personalized diffusion model with pixel-level control. We propose a Local Consistency Guidance (LCG) strategy, composed of local-cross attention and local style transfer, to ensure temporal consistency. This framework enables the synthesis of high-quality stylized face videos with excellent temporal continuity.

#25 Potential-Based Reward Shaping for Intrinsic Motivation (Student Abstract)

Authors: Grant C. Forbes ; David L. Roberts

Recently there has been a proliferation of intrinsic motivation (IM) reward shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was developed for. We present an extension to PBRS that we show preserves the set of optimal policies under a more general set of functions than has been previously demonstrated. We also present Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a potential-based form that is usable without altering the set of optimal policies. Testing in the MiniGrid DoorKey environment, we demonstrate that PBIM successfully prevents the agent from converging to a suboptimal policy and can speed up training.
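
For background, classic potential-based reward shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward. The sketch below shows why this form is safe: the discounted shaping terms telescope, so two trajectories with the same start state, end state, and length collect the same shaping return regardless of the path taken. This covers only the standard state-based case, not the extension or the PBIM conversion described in the abstract; the potential function and trajectories are hypothetical.

```python
def shaping_reward(phi, state, next_state, gamma):
    """Classic potential-based shaping term: gamma * phi(s') - phi(s)."""
    return gamma * phi(next_state) - phi(state)


def discounted_shaping_return(phi, trajectory, gamma):
    """Discounted sum of shaping rewards along a sequence of states."""
    return sum(
        gamma ** t * shaping_reward(phi, s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
    )


def phi(state):
    """Hypothetical potential: negative distance to a goal at position 10."""
    return -abs(10 - state)


gamma = 0.9
path_a = [0, 1, 2, 3]  # direct route
path_b = [0, 5, 1, 3]  # detour with the same endpoints and length

return_a = discounted_shaping_return(phi, path_a, gamma)
return_b = discounted_shaping_return(phi, path_b, gamma)
# Closed form from telescoping: gamma^T * phi(s_T) - phi(s_0).
closed_form = gamma ** 3 * phi(3) - phi(0)
```

Because the shaping return depends only on the endpoints and the horizon, it cannot change which policy is optimal. IM rewards that depend on more than the current transition do not fit this form directly, which is the gap the abstract says its extension addresses.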